Validating genomic offset predictions with mortality and height data from common gardens
Published
January 16, 2025
1 Introduction
In this report, we evaluate whether genomic offset (GO) predictions from the different GEA methods (LFMM, GF, GDM and RDA) and SNP sets (control, candidate and all SNPs for LFMM) are good predictors of fitness proxies (height and mortality data) from five clonal common gardens located in Spain (Asturias, Madrid, Cáceres), Portugal (Fundão) and France (Pierroton). For that, we predicted the genomic offset of each population when transplanted in the environment of the common garden (so instead of predicting the genomic offset into future climates, we predict it in the new environment of the common garden, i.e. space-for-time approach). The rational is that populations with higher genomic offset in a given common garden are expected to have lower relative fitness in the common garden.
We also calculate the climatic transfer distances (CTD) between the climate-of-origin of the populations and the climate of the common garden (i.e. the absolute value of the difference between the climate of origin of the populations and the climate in the common garden between the tree planting date and the measurement date). We then estimate the association between tree height and mortality and the climatic transfer distances. The climatic transfer distances are calculated for the same set of climatic variables as the one used to calculate the genomic offset (one CTD per climatic variable).
2 Data
2.1 Genomic offset predictions
We load the genomic offset predictions estimated from the different GEAs.
# selected climatic variablesclim_var <-readRDS(here("data/ClimaticData/NamesSelectedVariables.rds"))# climatic data in the common gardens (btw planting and measurement dates)cg_clim <-readRDS(here("data/ClimaticData/CommonGardens/ClimateCG.rds")) %>% dplyr::select(cg,any_of(clim_var)) # Loading point estimate climatic data at the location of the populations (reference climatic period)adj <-"noADJ"# not adjusted for elevationref_period <-"ref_1901_1950"# reference period 1901-1950clim_past <-readRDS(here(paste0("data/ClimaticData/MaritimePinePops/ClimatePopulationLocationPointEstimates_ReferencePeriods_",adj,".rds")))[[ref_period]]$ref_means %>% dplyr::select(pop,any_of(clim_var))# Preparing the dataset for clim_dist <- clim_pastdf <-lapply(cg_clim[["cg"]], function(x){# Calculating the CTD for each climatic variablefor(var in clim_var){clim_dist[[var]] <- ( clim_past[[var]] - cg_clim %>%filter(cg == x) %>%pull(var) ) %>%abs()} #names(clim_dist) <- c("pop", str_c(names(clim_past[,-1]),"_CTD"))# Calculating the Euclidean distances for all climatic variableslist_scaled_clim_df <-generate_scaled_clim_datasets(clim_var, clim_pred = cg_clim)cg_clim_i <- list_scaled_clim_df$clim_pred %>%filter(cg == x) %>% dplyr::select(-cg)clim_dist$EucliDist <-map_dbl(1:nrow(list_scaled_clim_df$clim_ref[,-1]),~sqrt(sum((list_scaled_clim_df$clim_ref[.x,-1] - cg_clim_i)^2)))return(clim_dist)}) %>%setNames(cg_clim[["cg"]]) %>%bind_rows(.id="cg") %>%pivot_longer(cols=c(any_of(clim_var),"EucliDist"),names_to="input_code",values_to="varX") %>%mutate(method ="CTD",method_input_code =paste0("CTD_",input_code),input_name =case_when(input_code =="bio1"~"Mean annual temperature (bio1, °C)", input_code =="bio12"~"Annual precipitation (bio12, mm)", input_code =="bio15"~"Precipitation seasonality (bio15, index)", input_code =="bio3"~"Isothermality (bio3, index)", input_code =="bio4"~"Temperature seasonality (bio4, °C)", input_code =="SHM"~"Summer heat moisture index (SHM, °C/mm)", input_code =="EucliDist"~"All climatic variables (Euclidean distance)"),method_input_code =paste0("CTD_", input_code),method_input_name =paste0("CTD - ", input_name)) %>%bind_rows(df)
2.3 Correlation CTDs vs GO predictions
We calculate the correlations among GO predictions and climatic transfer distances in the common gardens.
Code
lapply(unique(df$cg), function(site_i){df_corr <- df %>% dplyr::filter(cg==site_i) %>% dplyr::select(method_input_code, varX,pop) %>%pivot_wider(values_from = varX, names_from = method_input_code)# Corrplot with ggplot# ====================correlations <- df_corr %>% dplyr::select(-pop) %>% cor# Order variables alphabeticallycorrelations <- correlations[order(rownames(correlations)), order(colnames(correlations))]# Melt the correlation matrix into a long formatcor_data <- reshape2::melt(correlations)# Filter to only show the lower trianglecor_data <- cor_data %>%filter(as.numeric(Var1) >as.numeric(Var2))# Plotggplot(cor_data, aes(Var1, Var2, fill = value)) +geom_tile(color ="white") +scale_fill_gradient2(low ="#BB4444", mid ="#FFFFFF", high ="#4477AA", midpoint =0, limits =c(-1, 1), name="Correlation") +geom_text(aes(label =round(value, 2)), color ="black", size =3) +theme_minimal() +theme(axis.text.x =element_text(size=11,angle =45, hjust =1),axis.text.y =element_text(size=11)) +labs(x =NULL, y =NULL)# Corrplot with make_corrplot_pdf# ===============================fig_options <-list(path =here(paste0("figs/ValidationCommonGarden/Corrplot_GOpreds_CTDs_",site_i,".pdf")),width=10,height=8)make_corrplot_pdf(df = df_corr,variables =colnames(df_corr)[-c(1:2)],text_size =0.5,number_size =0.4,fig_options = fig_options)})
2.4 Phenotypic data
We load the survival and mortality data from the five common gardens.
Code
pheno_data <-readRDS(file=here("data/CommonGardenData/PhenoDataNovember2019_AnnualTraits_UpdatedSept2021_AllSites.rds")) %>% dplyr::rename(pop=prov)no_nas <-sapply(pheno_data, function(x) length(x)-sum(is.na(x)))list_pheno <-list()list_pheno$`Asturias, Spain (37 months)`<-table(pheno_data$site,pheno_data$AST_survmar14)["asturias",]list_pheno$`Bordeaux, France (85 months)`<-table(pheno_data$site,pheno_data$BDX_surv18)["bordeaux",]list_pheno$`Cáceres, Spain (8 months)`<-table(pheno_data$site,pheno_data$CAC_survdec11)["caceres",]list_pheno$`Madrid, Spain (13 months)`<-table(pheno_data$site,pheno_data$MAD_survdec11)["madrid",]list_pheno$`Fundão, Portugal (27 months)`<-table(pheno_data$site,pheno_data$POR_survmay13)["portugal",]df_exp_design <- list_pheno %>%bind_rows(.id="cg") %>%setNames(c("Common garden (tree age)","Number of dead trees","Number of trees alive")) %>%mutate("Number of height measurements"=c(no_nas[["AST_htmar14"]], no_nas[["BDX_htnov18"]], no_nas[["CAC_htdec11"]], no_nas[["MAD_htdec11"]], no_nas[["POR_htmay13"]]))saveRDS(df_exp_design, file =here("tables/ExpDesign_CG.rds"))df_exp_design %>% kable_mydf
Common garden (tree age)
Number of dead trees
Number of trees alive
Number of height measurements
Asturias, Spain (37 months)
206
3976
3976
Bordeaux, France (85 months)
119
3225
3217
Cáceres, Spain (8 months)
3818
366
340
Madrid, Spain (13 months)
3138
1046
1046
Fundão, Portugal (27 months)
1517
2666
2665
Height and survival measurements in each common garden:
Asturias (Spain) in March 2014 when the trees were 37 month-old (trees were planted in February 2011).
Bordeaux (France) in November 2018 when the trees were 85 month-old (trees were planted in October 2011).
Cáceres (Spain) in December 2011 when the trees were 8 month-old (trees were planted in April 2011). Note that for this common garden, we calculate the bioclimatic variables for the entire year 2011 (instead of calculating the variables only for months between April and December). Indeed, the calculation of the annual bioclimatic variables will be wrong if we do not account for some months, e.g. the mean annual temperature will be higher than expected because we do not account for some winter months.
Madrid (Spain) in December 2011 when the trees were 13 month-old (trees were planted in November 2010).
Fundão (Portugal) in May 2013 when the trees were 27 month-old (trees were planted in February 2011)
3 Mortality models
In this section, we want to determine whether genomic offset (GO) or climate transfer distances (CTD) are associated with the proportion of dead trees in the populations, independently in the five common gardens. We build a model that assumes that the initial sapling height acts as a confounder. Indeed, trees that were higher at the time of planting have a higher probability of survival. This is particularly true in Madrid and Cáceres (Spain) where the seedlings experienced an extreme drought event the same year they were planted, resulting in very high mortality rates. Here is the model:
with \(a_{p}\) the count of individuals that died in the population \(p\), \(N_{p}\) the total number of individuals in the population \(p\) (=number of individuals that were initially planted in the common garden), \(p_p\) is the estimated probability of mortality in the population \(p\), \(X_{p}\) is the genomic offset or climatic transfer distance for the population \(p\) and \(H_{p}\) is the BLUPs for height of the population \(p\)(population varying intercepts calculated across all common gardens in the model 1 of Archambeau et al. 2022). \(H_{p}\) is used as a proxy of the initial tree height.
We load the population-specific intercepts from the model 1 of Archambeau et al. (2022).
Code
mod_arch2022 <-readRDS(file=here("data/Archambeauetal2022_MOD1.rds"))pop_heights <- mod_arch2022$fit %>% broom.mixed::tidyMCMC(estimate.method ="mean",conf.int = T) %>%# we take the mean of the prov random interceptsfilter(str_detect(term, "^(r_prov\\[)")) %>% dplyr::rename(height=estimate,pop=term) %>%mutate(pop=str_sub(pop,8,-12))pop_heights[1:5,] %>% kable_mydf
pop
height
std.error
conf.low
conf.high
ALT
0.09
0.04
0.01
0.17
ARM
0.12
0.04
0.04
0.20
ARN
-0.09
0.03
-0.16
-0.03
BAY
-0.14
0.03
-0.20
-0.07
BON
-0.04
0.04
-0.12
0.04
DRYAD REPOSITORY: We export the height intercepts from Archambeau et al. (2022) to the DRYAD repository: HeightIntercepts_Archambeauetal2022.csv.
The response variable is counts of dead trees. To calculate these counts, we load the survival data in the common gardens, in which 0 corresponds to dead trees and 1 to survivors.
S4 class stanmodel 'anon_model' coded as follows:
data {
int N;
int nb_tot[N]; // Total number of trees in the population
int nb_dead[N]; // Number of dead trees in the population
vector[N] H; // Mean tree height of the population
vector[N] X; // Genomic offset or climatic transfer distance of the population
}
parameters {
real beta_0;
real beta_H;
real beta_X;
}
model {
nb_dead ~ binomial_logit(nb_tot,beta_0 + beta_H * H + beta_X * X);
beta_0 ~ normal(0,5);//std_normal();
beta_H ~ normal(0,5);//std_normal();
beta_X ~ normal(0,5);//std_normal();
}
// generated quantities{
// vector[N] log_lik;
// // log likelihood for loo
// for (n in 1:N) {
// log_lik[n] = binomial_logit_lpmf( nb_dead[n] | nb_tot[n] , beta_0 + beta_H * H[n] + beta_X * X[n]);
// }
// }
We run the models for each common garden and each genomic offset predictions/climatic transfer distances, so a total of 5 * 15 = 75 models runs.
Code
coefftab <-lapply(unique(survival_data$site),function(site_i){lapply(unique(df$method_input_code), function(method_input_code_i){# Subset the data for the site i and method isub_data <- survival_data %>%filter(site == site_i) %>%group_by(pop) %>% dplyr::summarise(nb_dead=n()-sum(survival),nb_tot=n()) # transform survival data into mortality datasub_data <- df %>%filter(method_input_code == method_input_code_i & cg == site_i) %>%inner_join(sub_data, by="pop") %>%inner_join(pop_heights %>% dplyr::select(any_of(c("height", "pop"))), by="pop") %>%arrange(pop)# Data in a list for Stan stanlist <-list(N =nrow(sub_data),nb_dead = sub_data$nb_dead,nb_tot = sub_data$nb_tot,H = (sub_data$height-mean(sub_data$height)) /sd(sub_data$height),X = (sub_data$varX -mean(sub_data$varX)) /sd(sub_data$varX))# Running the modelmod <-sampling(stancode, data = stanlist, iter =2000, chains =4, cores =4, save_warmup =FALSE) # Save the model and the stanlistlist(mod = mod, stanlist = stanlist) %>%saveRDS(file=here(paste0("outputs/ValidationCommonGarden/MortalityModels/stan_models/", site_i,"_",method_input_code_i,".rds"))) })})
Code
coefftab <-lapply(unique(survival_data$site),function(site_i){lapply(unique(df$method_input_code), function(method_input_code_i){# Subset the data - keeping only one set of method / SNP set sub_data <- df %>%filter(method_input_code == method_input_code_i & cg == site_i) mod <-readRDS(file=here(paste0("outputs/ValidationCommonGarden/MortalityModels/stan_models/", site_i,"_",method_input_code_i,".rds")))[["mod"]]# Save coefficients broom.mixed::tidyMCMC(mod,droppars =NULL, robust =FALSE, # give mean and standard deviationess = F, rhat = F, conf.int = T,conf.level =0.95) %>%filter(str_detect(term, c('beta'))) %>% dplyr::rename(mean=estimate,std_deviation=std.error,conf_low=conf.low,conf_high=conf.high) %>%mutate(cg = site_i,method_input_code = method_input_code_i,method_input_name =unique(sub_data$method_input_name),method =unique(sub_data$method),input_name =unique(sub_data$input_name),input_code =unique(sub_data$input_code),.before=1) }) %>%bind_rows()}) %>%bind_rows()coefftab %>%saveRDS(file=here("outputs/ValidationCommonGarden/MortalityModels/coefftab.rds"))
Below, we plot the mean and 95% credible intervals of:
the \(\beta_X\) coefficients, which stand for the association between the counts of dead trees and the genomic offset predictions (i.e. capturing the potential maladaptation of the populations when transplanted in the new environment of the common gardens) or the climatic transfer distances.
the \(\beta_H\) coefficients, which stand for the association between the counts of dead trees and the BLUPs for height in the five common gardens, used as a proxy of the initial seedling height (a confounder in the model).
Graph titles include the time in months corresponding to the age at which height and survival were recorded. Coefficients in the green area have the expected sign, reflecting higher mortality rates in populations with higher GO predictions.
Code
coeff_match <-list(beta_H ="Regression coefficients $\\beta_H$ in mortality models",beta_X ="Regression coefficients $\\beta_X$ in mortality models")p <-lapply(c("beta_X","beta_H"), function(coeff){p <- coefftab %>%filter(term == coeff) %>%mutate(cg_name =case_when(cg =="asturias"~"Asturias, Spain (37 months)", cg =="bordeaux"~"Bordeaux, France (85 months)", cg =="caceres"~"Cáceres, Spain (8 months)", cg =="madrid"~"Madrid, Spain (13 months)", cg =="portugal"~"Fundão, Portugal (27 months)")) %>%mutate(input_name =factor(input_name, levels =c(unique(coefftab$input_name)[[13]],unique(coefftab$input_name)[[11]],unique(coefftab$input_name)[[12]],unique(coefftab$input_name)[[10]],unique(coefftab$input_name)[[8]],unique(coefftab$input_name)[[9]],unique(coefftab$input_name)[[1]],unique(coefftab$input_name)[[2]],unique(coefftab$input_name)[[3]],unique(coefftab$input_name)[[4]],unique(coefftab$input_name)[[5]],unique(coefftab$input_name)[[6]],unique(coefftab$input_name)[[7]])),method =factor(method, levels =c(unique(coefftab$method)[[2]],unique(coefftab$method)[[3]],unique(coefftab$method)[[4]],unique(coefftab$method)[[5]],unique(coefftab$method)[[6]],unique(coefftab$method)[[1]])),cg_name =factor(cg_name, levels =c("Madrid, Spain (13 months)","Cáceres, Spain (8 months)","Asturias, Spain (37 months)","Bordeaux, France (85 months)","Fundão, Portugal (27 months)")))# Plots with CTD and SNP sets# ===========================p_allvar <- p %>%ggplot(aes(x = method,y = mean,ymin = conf_low, ymax = conf_high,color = input_name,shape = input_name)) + {if(coeff=="beta_X") geom_rect(inherit.aes=FALSE, aes(xmin=-Inf, xmax=Inf, ymin=0, ymax=Inf), color="transparent", fill="green", alpha=0.005)} +geom_hline(yintercept =0,color="gray") +geom_pointinterval(position =position_dodge(width =0.6),point_size=3, size=2) +facet_wrap(~cg_name, ncol=2) +ylab(TeX(coeff_match[[coeff]])) +xlab("") +scale_color_manual(values=colors_coeff)+scale_shape_manual(values =c(rep(17,6),rep(16,7))) +theme_bw() +labs(color="",shape="") +theme(axis.text.x =element_text(size=13),axis.text.y =element_text(size=13),axis.title.y =element_text(size=16),legend.title =element_text(size=13), strip.text.x =element_text(size =16),strip.background =element_blank(),legend.position =c(0.77,0.15),legend.text=element_text(size=12),panel.grid.minor.x=element_blank(),panel.grid.major.x=element_blank()) +guides(color =guide_legend(ncol=1),shape =guide_legend(override.aes =list(size =2 )))# save in pdf and pngggsave(p_allvar,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/",coeff,"_SNPsandCTD.pdf")),device="pdf",height=11,width=13)ggsave(p_allvar,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/",coeff,"_SNPsandCTD.png")),height=11,width=13)# Plots with SNPs only# ====================p_SNPs <- p %>%filter(!method =="CTD") %>%ggplot(aes(x = method,y = mean,ymin = conf_low, ymax = conf_high,color = input_name,shape = input_name)) + {if(coeff=="beta_X") geom_rect(inherit.aes=FALSE, aes(xmin=-Inf, xmax=Inf, ymin=0, ymax=Inf), color="transparent", fill="green", alpha=0.007)} +geom_hline(yintercept =0,color="gray") +geom_pointinterval(position =position_dodge(width =0.45),point_size=3, size=2) +# facet_wrap(~cg_name, ncol=2) +ylab(TeX(coeff_match[[coeff]])) +xlab("") +scale_color_manual(values=colors_coeff) +scale_shape_manual(values =c(rep(17,6),rep(16,6))) +theme_bw() +labs(color="",shape="") +theme(axis.text.x =element_text(size=13),axis.text.y =element_text(size=13),axis.title.y =element_text(size=16),axis.title.x =element_text(size=1),legend.title=element_text(size=13), strip.text.x =element_text(size =16),strip.background =element_blank(),legend.position =c(0.77,0.15),legend.text=element_text(size=14),panel.grid.minor.x=element_blank(),panel.grid.major.x=element_blank()) +guides(color=guide_legend(ncol=1),shape =guide_legend(override.aes =list(size =2 )))# save in pdf and pngggsave(p_SNPs,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/",coeff,"_SNPsets.pdf")),device="pdf",height=11,width=13)ggsave(p_SNPs,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/",coeff,"_SNPsets.png")),height=11,width=13)# Figure in the main manuscript################################ We want to add some images to represent the climatic differences among common gardensif(coeff=="beta_X"){annotation_custom2 <-function (grob, xmin =-Inf, xmax =Inf, ymin =-Inf, ymax =Inf, data){ layer(data = data, stat = StatIdentity, position = PositionIdentity, geom = ggplot2:::GeomCustomAnn,inherit.aes =TRUE, params =list(grob = grob, xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax))}df_png <- p %>% dplyr::select(cg_name) %>% distinct %>% dplyr::mutate(image =case_when(cg_name =="Asturias, Spain (37 months)"~"reports/cloud.png", cg_name =="Bordeaux, France (85 months)"~"reports/cloud.png", cg_name =="Cáceres, Spain (8 months)"~"reports/sun.png", cg_name =="Madrid, Spain (13 months)"~"reports/sun.png", cg_name =="Fundão, Portugal (27 months)"~'reports/cloud_and_sun.png'))list_pngs <-lapply(unique(p$cg_name), function(cg_name_i){sub <- p %>%filter(cg_name==cg_name_i) %>%slice(1)png_cg =annotation_custom2(rasterGrob(readPNG(here(df_png[df_png$cg_name==cg_name_i,"image"])),interpolate=TRUE), ymin =-0.54,ymax=-0.24,xmin =0.42,xmax =1, data=sub)})p_allvar_images <- p_allvar + list_pngs[[1]]+ list_pngs[[2]]+ list_pngs[[3]]+ list_pngs[[4]]+ list_pngs[[5]]# save in pdf and pngggsave(p_allvar_images,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/",coeff,"_SNPsets_WithCloudAndSunImages.pdf")),device="pdf",height=11,width=13)}return(p_allvar)})
Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2 3.5.0.
ℹ Please use the `legend.position.inside` argument of `theme()` instead.
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = 0, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = 0, : All aesthetics have length 1, but the data has 150 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
All aesthetics have length 1, but the data has 150 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = 0, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Code
p
[[1]]
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = 0, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
[[2]]
Interpretation
As expected, the association between mortality rates and the initial tree height was particularly strong in Madrid and Cáceres (Spain) where the seedlings experienced an extreme drought event the same year they were planted, resulting in very high mortality rates. More surprisingly, this association was also osberved in Fundão, Portugal. However, the initial tree height was not associated with mortality rates in Bordeaux (France) and Asturias (Spain), which benefit from the favorable climates of the Atlantic region and in which the mortality rates were very low.
In Fundão, Cáceres and Madrid, for most genomic offset predictions, populations experiencing higher mortality rates also showed higher genomic offset. The most consistent associations across the three common gardens were obtained with:
Gradient Forest (GF) with both candidate and control SNPs.
Redundancy analysis (RDA) with both candidate and control SNPs.
Latent factor mixed model (LFMM) with all SNPs or control SNPs.
Interestingly, climatic transfer distances generally explain mortality probability less well than genomic offset predictions, and none of them show a consistent association with the number of dead trees across the three common gardens. Note that it is almost the case for the climatic transfer distance calculated based on the temperature seasonality, which shows a positive association with the counts of dead trees in Madrid and Fundão, and almost in Cáceres.
In Asturias, no association was detected between genomic offset predictions and mortality rates, which may be due to the favorable climatic conditions of this common garden and the associated low mortality rates (only 206 dead trees). Therefore, in this common garden, mortality events are likely to be mostly random events due to other factors than climate. However, we can note that an association was detected between mortality rates and isothermality: populations from areas with high isothermality tend to die more in the common garden of Asturias, Spain.
In Bordeaux, there were even less dead trees than in Asturias (119 dead trees). However, the two nonparametric approach used to predict the genomic offset (GDM and GF) detected a positive association between mortality rates and the genomic offset predictions obtained with both the candidate and the control SNPs. A positive association was also detected with the genomic offset predictions obtained with candidate SNPs with the LFMM approach. On the other hand, none of the genomic offset predictions from the RDA approach were associated with the mortality rates. Interestingly, the only association with a climatic transfer distance was with the temperature seasonality (bio4), which was the most important variable to explain the turnover in allele frequency in the GF and GDM approaches.
3.4 Predictions of tree mortality probability
3.4.1 Plots
Let’s visualize the uncertainty around the estimation of the probability of tree mortality \(p\).
Code
##################################################################################################### Visualizing the relationships between GO predictions or CTDs and mortality probability predictions####################################################################################################coefftab <-readRDS(file =here("outputs/ValidationCommonGarden/MortalityModels/coefftab.rds")) %>%mutate(cg_name =case_when(cg =="asturias"~"Asturias, Spain (37 months)", cg =="bordeaux"~"Bordeaux, France (85 months)", cg =="caceres"~"Cáceres, Spain (8 months)", cg =="madrid"~"Madrid, Spain (13 months)", cg =="portugal"~"Fundão, Portugal (27 months)"))lapply(unique(coefftab$cg), function(site_i){lapply(unique(coefftab$method_input_code), function(method_input_code_i){ sub_coefftab <- coefftab %>%filter(method_input_code == method_input_code_i & cg == site_i)# We load the stanlist stanlist <-readRDS(file =here(paste0("outputs/ValidationCommonGarden/MortalityModels/stan_models/", site_i,"_",method_input_code_i,".rds")))$stanlist# We load the model mod <-readRDS(file =here(paste0("outputs/ValidationCommonGarden/MortalityModels/stan_models/", site_i,"_",method_input_code_i,".rds")))$mod# We extract the posterior distributions of all parameters post <-as.data.frame(mod)# Vector of predictor values (based on the min and max of the GO predictions) x_vec <-seq(min(stanlist$X),max(stanlist$X),0.05)# Function to predict the mortality probability p with the initial tree height fixed to the mean f_p <-function(x) 1/ (exp(-(post$`beta_0`+ post$`beta_H`*mean(stanlist$H) + post$`beta_X`* x)) +1) p_pred <-sapply(x_vec, f_p) %>% tibble::as_tibble() %>%rename_all(function(x) x_vec) %>%mutate(Iter =row_number()) %>%gather(x, p, -Iter) %>%group_by(x) %>%mutate(hpdi_l = HDInterval::hdi(p, credMass =0.90)[1],hpdi_h = HDInterval::hdi(p, credMass =0.90)[2],p_mean =mean(p)) %>%ungroup() %>%mutate(x =as.numeric(x))# Keep mean and 90% HDPI of p (one value for each iteration) p_mean_df <- p_pred %>% dplyr::select(x,hpdi_l,hpdi_h,p_mean) %>% dplyr::distinct()# Plots#=======# Plot options y_limits <-c(0,1)if(unique(sub_coefftab$method) =="CTD"){x_axis <-"Mean-centered CTD"} else {x_axis <-"Mean-centered GO predictions"} y_axis <-"Probability of tree mortality" p <-ggplot() +ylim(y_limits) +labs(y=y_axis, x=x_axis) +theme_bw()# First 100 posterior draws of the mortality probability p p1 <- p +geom_point(data = p_pred %>%filter(Iter <101),aes(x, p), alpha = .2, color ='dodgerblue')# Mean (line) and 90% HPDI (shaded region) of the mortality probability p p2 <- p +geom_ribbon(data = p_mean_df,aes(x = x, ymin = hpdi_l, ymax = hpdi_h),alpha = .2,fill='dodgerblue') +geom_line(data = p_mean_df,aes(x=x, y=p_mean), col="dodgerblue4") p1_p2 <-ggarrange(p1, p2 +labs(y=""), nrow =1)# Title title <-ggdraw() +draw_label(paste0(unique(sub_coefftab$cg_name)," - ",unique(sub_coefftab$method_input_name)),fontface ='bold',x =0, hjust =0) +theme(plot.margin =margin(0, 0, 0, 7))# merge title and plots p1_p2 <-plot_grid(title, p1_p2, ncol =1, rel_heights =c(0.1, 1))ggsave(p1_p2,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/MortalityProbabilityPredictions_UncertaintyIntervals/", method_input_code_i,"_",site_i,".pdf")),device="pdf",height=6,width=11)})})
lapply(unique(survival_data$site),function(site_i){# Subset the data for the site i and add the initial heightssub_data <- survival_data %>%filter(site == site_i) %>%group_by(pop) %>% dplyr::summarise(nb_dead=n()-sum(survival),nb_tot=n()) %>%# transform survival data into mortality datainner_join(pop_heights %>% dplyr::select(any_of(c("height", "pop"))), by="pop") %>%arrange(pop)# Data in a list for Stan stanlist <-list(N =nrow(sub_data),nb_dead = sub_data$nb_dead,nb_tot = sub_data$nb_tot,H = (sub_data$height-mean(sub_data$height)) /sd(sub_data$height))# Running the modelmod <-sampling(stancode, data = stanlist, iter =2000, chains =4, cores =4, save_warmup =FALSE) # Save the model and the stanlistlist(mod = mod, stanlist = stanlist) %>%saveRDS(file=here(paste0("outputs/ValidationCommonGarden/MortalityModels/stan_models/", site_i,"_ModelWithoutGO.rds")))})
Code
corrtab <-lapply(unique(survival_data$site),function(site_i){ list_mod <-readRDS(file=here(paste0("outputs/ValidationCommonGarden/MortalityModels/stan_models/",site_i,"_ModelWithoutGO.rds")))# Extract posterior samples posterior_samples <- rstan::extract(list_mod[[1]])# Extract parameters (posterior samples for beta_0 and beta_H) beta_0_samples <- posterior_samples$beta_0 beta_H_samples <- posterior_samples$beta_H# Mean tree height for each population (H) H <- list_mod[[2]]$H# Generate posterior predictions for each population N <- list_mod[[2]]$N # number of populations num_samples <-length(beta_0_samples) # Number of posterior samples# Initialize matrix to store predicted probabilities mortality_prob <-matrix(NA, nrow=num_samples, ncol=N)# Compute mortality probability for each population for each posterior samplefor (i in1:num_samples) {for (j in1:N) {# Logistic transformation to get probability of mortality mortality_prob[i, j] <-1/ (1+exp(-(beta_0_samples[i] + beta_H_samples[i] * H[j]))) } }# Summary of posterior predictive mortality probabilities for each population mortality_prob_summary <-apply(mortality_prob, 2, function(x) c(mean=mean(x), sd=sd(x), quantile(x, probs=c(0.025, 0.975))))# Convert to a readable data frame (optional) mortality_prob_df <-tibble(pop =unique(df$pop),mean_prob = mortality_prob_summary["mean", ],sd_prob = mortality_prob_summary["sd", ],lower_95 = mortality_prob_summary["2.5%", ],upper_95 = mortality_prob_summary["97.5%", ],prob_obs = list_mod[[2]]$nb_dead / list_mod[[2]]$nb_tot, # Observed proportion of dead treesdiff_p = prob_obs - mean_prob, # diff bwt obs prob. and estimated prob.cg = site_i ) #%>% #arrange(diff_p) %>% #mutate(diff_p_rank = 1:nrow(.))lapply(unique(df$method_input_code), function(method_input_code_i){ sub_data <- df %>%filter(method_input_code == method_input_code_i & cg == site_i) %>%#arrange(varX) %>% #mutate(varX_rank = 1:nrow(.)) %>% inner_join(mortality_prob_df, by=c("pop","cg"))tibble(pearson_correlation =cor.test(sub_data$varX, sub_data$diff_p, method="pearson")$estimate[[1]],pearson_pvalue =cor.test(sub_data$varX, sub_data$diff_p, method="pearson")$p.value,spearman_correlation =cor.test(sub_data$varX, sub_data$diff_p, method="spearman")$estimate[[1]],spearman_pvalue =cor.test(sub_data$varX, sub_data$diff_p, method="spearman")$p.value,method_input_code = method_input_code_i,method_input_name =unique(sub_data$method_input_name),input_name =unique(sub_data$input_name),input_code =unique(sub_data$input_code),method =unique(sub_data$method),cg = site_i)}) %>%bind_rows()}) %>%bind_rows()saveRDS(corrtab, here("outputs/ValidationCommonGarden/MortalityModels/corrtab.rds"))
We plot the correlations.
Code
corrtab <-readRDS(here("outputs/ValidationCommonGarden/MortalityModels/corrtab.rds"))correlation_types <-c("pearson_correlation", "spearman_correlation")lapply(correlation_types, function(coeff){ p <- corrtab %>%pivot_longer(values_to ="correlation_estimate", names_to ="correlation_type", cols =all_of(correlation_types)) %>%mutate(correlation_pvalue =ifelse(correlation_type =="pearson_correlation", pearson_pvalue, spearman_pvalue)) %>% dplyr::select(-pearson_pvalue,-spearman_pvalue) %>%mutate(pvalue_signi =ifelse(correlation_pvalue <0.05, "p-value < 0.05", "p-value ≥ 0.05")) %>%filter(correlation_type == coeff) %>%mutate(cg_name =case_when(cg =="asturias"~"Asturias, Spain (37 months)", cg =="bordeaux"~"Bordeaux, France (85 months)", cg =="caceres"~"Cáceres, Spain (8 months)", cg =="madrid"~"Madrid, Spain (13 months)", cg =="portugal"~"Fundão, Portugal (27 months)")) %>%mutate(input_name =factor(input_name, levels =c(unique(corrtab$input_name)[[13]],unique(corrtab$input_name)[[11]],unique(corrtab$input_name)[[12]],unique(corrtab$input_name)[[10]],unique(corrtab$input_name)[[8]],unique(corrtab$input_name)[[9]],unique(corrtab$input_name)[[1]],unique(corrtab$input_name)[[2]],unique(corrtab$input_name)[[3]],unique(corrtab$input_name)[[4]],unique(corrtab$input_name)[[5]],unique(corrtab$input_name)[[6]],unique(corrtab$input_name)[[7]])),method =factor(method, levels =c(unique(corrtab$method)[[2]],unique(corrtab$method)[[3]],unique(corrtab$method)[[4]],unique(corrtab$method)[[5]],unique(corrtab$method)[[6]],unique(corrtab$method)[[1]])),cg_name =factor(cg_name, levels =c("Madrid, Spain (13 months)","Cáceres, Spain (8 months)","Asturias, Spain (37 months)","Bordeaux, France (85 months)","Fundão, Portugal (27 months)")))# Plots with CTD and SNP sets# ===========================p_allvar <- p %>%ggplot(aes(x = method,y = correlation_estimate,color = input_name,shape = pvalue_signi)) +geom_rect(inherit.aes=FALSE, aes(xmin=-Inf, xmax=Inf, ymin=0, ymax=Inf), color="transparent", fill="green", alpha=0.005) +geom_hline(yintercept =0,color="gray") +geom_point(position =position_dodge(width = .5), size=5) +facet_wrap(~cg_name, ncol=2) + {if(coeff=="pearson_correlation") ylab("Pearson correlation estimates")} + {if(coeff=="spearman_correlation") ylab("Spearman correlation estimates (rank correlations)") } +xlab("") +ylim(c(-0.44,0.6)) +scale_color_manual(values=colors_coeff)+scale_shape_manual(values =c(16,18)) +#, name="p-value of the correlations"theme_bw() +labs(color="",shape="") +theme(axis.text.x =element_text(size=13),axis.text.y =element_text(size=13),axis.title.y =element_text(size=16),legend.title =element_blank(), strip.text.x =element_text(size =16),strip.background =element_blank(),legend.position =c(0.77,0.15),legend.text=element_text(size=13),panel.grid.minor.x=element_blank(),panel.grid.major.x=element_blank()) +guides(color =guide_legend(ncol=1),shape =guide_legend(override.aes =list(size =2 )))p_allvar# save in pdf and pngggsave(p_allvar,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/",coeff,"_SNPsandCTD.pdf")),device="pdf",height=13.6,width=15)ggsave(p_allvar,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/",coeff,"_SNPsandCTD.png")),height=13.6,width=15)# Adding some images to represent the climatic differences among common gardensannotation_custom2 <-function (grob, xmin =-Inf, xmax =Inf, ymin =-Inf, ymax =Inf, data){ layer(data = data, stat = StatIdentity, position = PositionIdentity, geom = ggplot2:::GeomCustomAnn,inherit.aes =TRUE, params =list(grob = grob, xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax))}df_png <- p %>% dplyr::select(cg_name) %>% distinct %>% dplyr::mutate(image =case_when(cg_name =="Asturias, Spain (37 months)"~"reports/cloud.png", cg_name =="Bordeaux, France (85 months)"~"reports/cloud.png", cg_name =="Cáceres, Spain (8 months)"~"reports/sun.png", cg_name =="Madrid, Spain (13 months)"~"reports/sun.png", cg_name =="Fundão, Portugal (27 months)"~'reports/cloud_and_sun.png'))list_pngs <-lapply(unique(p$cg_name), function(cg_name_i){sub <- p %>%filter(cg_name==cg_name_i) %>%slice(1)png_cg =annotation_custom2(rasterGrob(readPNG(here(df_png[df_png$cg_name==cg_name_i,"image"])),interpolate=TRUE), ymin =-0.5,ymax=-0.3,xmin =0.5,xmax =1.1, data=sub)})p_allvar_images <- p_allvar + list_pngs[[1]]+ list_pngs[[2]]+ list_pngs[[3]]+ list_pngs[[4]]+ list_pngs[[5]]p_allvar_images# save in pdf and pngggsave(p_allvar_images,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/",coeff,"_SNPsandCTD_WithImages.pdf")),device="pdf",height=13.6,width=15)})
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = 0, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, : for 'p-value ≥ 0.05' in 'mbcsToSbcs': >= substituted for ≥ (U+2265)
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = 0, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, : for 'p-value ≥ 0.05' in 'mbcsToSbcs': >= substituted for ≥ (U+2265)
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = 0, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, : for 'p-value ≥ 0.05' in 'mbcsToSbcs': >= substituted for ≥ (U+2265)
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = 0, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, : for 'p-value ≥ 0.05' in 'mbcsToSbcs': >= substituted for ≥ (U+2265)
In this section, we estimate the association between genomic offset (GO) predictions or climate transfer distances (CTD) and tree height, independently in the five common gardens. We compare four different mathematical models.
We first subset height measurements from the dataset (below the first five rows of the dataset are shown).
\(Y_{ipb}\) is the height of the individual \(i\) in the population \(p\) and block \(b\).
\(\sigma^{2}\) is the residual variance of the model.
\(B_b\) are the block intercepts.
\(X_p\) is the GO or CTD of the population \(p\), with \(\beta_{X1}\) and \(\beta_{X2}\) being its linear and quadratic coefficients, respectively. The quadratic term was included to allow for potential nonlinearity in the response, following Fitzpatrick et al. (2021).
S4 class stanmodel 'anon_model' coded as follows:
data {
int N; // Number of individuals
vector[N] Y; // Response variable (individual tree height)
vector[N] X; // Genomic offset or climatic transfer distances
int<lower=0> nb_bloc; // Number of blocks
int<lower=0, upper=nb_bloc> bloc[N]; // Blocks
}
parameters {
vector[nb_bloc] alpha_bloc; // Intercepts of the blocks
real beta_X1; // Linear coefficent of GO or CTD
real beta_X2; // Quadratic coefficent of GO or CTD
real<lower = 0> sigma; // Residual variance of the model
}
transformed parameters {
vector[N] mu; // Linear predictor
real R_squared; // R^2 to evaluate the goodness of fit of the model
mu = alpha_bloc[bloc] + beta_X1 * X + beta_X2 * square(X);
R_squared = 1 - variance(Y - mu) / variance(Y);
}
model {
// Likelihood
Y ~ normal(mu, sigma);
// Priors
sigma ~ exponential(1);
alpha_bloc ~ std_normal();
beta_X1 ~ std_normal();
beta_X2 ~ std_normal();
}
// generated quantities{
// // log likelihood for loo
// vector[N] log_lik;
// vector[N] muhat;
// for (n in 1:N) {
// log_lik[n] = normal_lpdf(Y[n] |mu[n],sigma);
// muhat[n] = normal_rng(mu[n], sigma);
// }
// }
lapply(unique(height_data$cg), function(site_i){lapply(unique(df$method_input_code), function(method_input_code_i){# Subset the data df_sub <- df %>%filter(method_input_code == method_input_code_i & cg == site_i) sub_data <- height_data %>%filter(cg == site_i) %>%left_join(df_sub, by =c("cg","pop"))# Data in a list for Stan stanlist <-list(N =nrow(sub_data),Y=(sub_data$height-mean(sub_data$height))/sd(sub_data$height),X=(sub_data$varX -mean(sub_data$varX))/sd(sub_data$varX),nb_bloc =length(unique(sub_data$block)),bloc =as.numeric(as.factor(sub_data$block)))# Run the models mod <-sampling(stancode, data = stanlist, iter =2000, warmup =1400, chains =4, cores =4, save_warmup =FALSE,pars = params_to_estimate) # Save the model and the stanlistlist(mod = mod, stanlist = stanlist) %>%saveRDS(file=here(paste0("outputs/ValidationCommonGarden/HeightModels/stan_models/M1/",site_i,"_",method_input_code_i,".rds"))) })})
Code
coefftab <-lapply(unique(height_data$cg),function(site_i){lapply(unique(df$method_input_code), function(method_input_code_i){# Subset the data - keeping only one set of method / SNP set sub_data <- df %>%filter(method_input_code == method_input_code_i & cg == site_i) mod <-readRDS(file=here(paste0("outputs/ValidationCommonGarden/HeightModels/stan_models/M1/", site_i,"_",method_input_code_i,".rds")))[["mod"]]# Save coefficients broom.mixed::tidyMCMC(mod,droppars =NULL, robust =FALSE, # give mean and standard deviationess = F, rhat = F, conf.int = T,conf.level =0.95) %>%#filter(str_detect(term, c('beta'))) %>% dplyr::rename(mean=estimate,std_deviation=std.error,conf_low=conf.low,conf_high=conf.high) %>%mutate(cg = site_i,method_input_code = method_input_code_i,method_input_name =unique(sub_data$method_input_name),method =unique(sub_data$method),input_name =unique(sub_data$input_name),input_code =unique(sub_data$input_code),.before=1) }) %>%bind_rows()}) %>%bind_rows()coefftab %>%saveRDS(file=here("outputs/ValidationCommonGarden/HeightModels/coefftab_M1.rds"))
4.1.3 Model coefficients
We plot the mean and 95% credible intervals of the \(\beta_{X_1}\) and \(\beta_{X_2}\) coefficients. Graph titles include the time in months corresponding to the age at which height and survival were recorded. Coefficients in the green area have the expected sign, reflecting lower height in populations with higher GO predictions.
Code
generate_interval_plots(model_i ="M1")
Joining with `by = join_by(cg, method_input_code, method_input_name, method, input_name, input_code)`
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = -Inf, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = -Inf, : All aesthetics have length 1, but the data has 150 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Joining with `by = join_by(cg, method_input_code, method_input_name, method, input_name, input_code)`
Joining with `by = join_by(cg, method_input_code, method_input_name, method, input_name, input_code)`
[[1]]
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = -Inf, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
[[2]]
[[3]]
4.1.4 Predicted quadratic relationships
We plot the predicted quadratic relationships between tree height and GO predictions / CTD.
Code
generate_poly_plots(model_i="M1")
4.1.5 Interpretation
Higher genomic offset predictions are consistently associated with lower tree height only in Asturias. Most genomic offset predictions are also associated with lower tree height in Bordeaux.
In Fundão, Cáceres and Madrid, the association between genomic offset predictions and tree height are often in the opposite direction as expected.
Interestingly, the climatic transfer distances based in the mean annual temperature and the annual precipitation are negatively associated with tree height in all common gardens (except Cáceres for the CTD based on annual precipitation). These CTD may therefore be better predictors of tree height in common gardens than the genomic offset predictions.
Finally, I think that using tree height as a proxy for fitness to evaluate genomic offset predictions in common gardens may not be appropriate in maritime pine because this trait has a very low genetic-by-environment interaction (see previous papers). Therefore, the association between genomic offset predictions and tree height in Asturias and Bordeaux may not be due to a higher fitness of the trees that are best adapted to the climate in these common gardens, but more probably to height differences that are common across all environments (i.e. populations from Atlantic climates are generally taller). This is confirmed by the models below in which the population-specific BLUPs for height across all common gardens was included as a confounder. When included, the association between genomic offset predictions and tree height in Asturias almost disappears and either disappears or goes in the opposite direction as expected in Bordeaux. Except the genomic offset predictions based on the RDA remain negatively associated with tree height in Bordeaux and Asturias.
Therefore, I am not sure we can go very far in this validation step. We may retain that genomic offset predictions seem to show the most consistent association with tree height when generated with the RDA, and that this association is robust (though smaller) even when the fixed genetic differences across populations are included as confounder.
4.2 Model 2
4.2.1 Model equation and code
Model 2 = Model 1 + Initial height as a confounder.
\(Y_{ipb}\) is the height of the individual \(i\) in the population \(p\) and block \(b\).
\(\sigma^{2}\) is the residual variance of the model.
\(B_b\) are the block intercepts.
\(X_p\) is the GO or CTD of the population \(p\), with \(\beta_{X1}\) and \(\beta_{X2}\) being its linear and quadratic coefficients, respectively. The quadratic term was included to allow for potential nonlinearity in the response, following Fitzpatrick et al. (2021).
\(H_p\) is a proxy of the initial height of the trees from population \(p\), i.e. when trees were planted. For that, we used the population varying intercepts calculated across all common gardens in the model 1 of Archambeau et al. (2022).
S4 class stanmodel 'anon_model' coded as follows:
data {
int N;
vector[N] Y; // Response variable (individual height)
vector[N] X; // Genomic offset or climatic transfer distances
vector[N] H; // Initial height of the populations
int<lower=0> nb_bloc; // Number of blocks
int<lower=0, upper=nb_bloc> bloc[N]; // Blocks
}
parameters {
vector[nb_bloc] alpha_bloc; // Intercepts of the blocks
real beta_X1; // Linear coefficent of GO or CTD
real beta_X2; // Quadratic coefficent of GO or CTD
real beta_H; // Coefficient of the initial height of the populations
real<lower = 0> sigma; // Residual variance of the model
}
transformed parameters {
vector[N] mu; // Linear predictor
real classic_R2; // classic R2 to evaluate the goodness of fit of the model
mu = alpha_bloc[bloc] + beta_X1 * X + beta_X2 * square(X) + beta_H * H;
classic_R2 = 1 - variance(Y - mu) / variance(Y); // classical R2
}
model {
// Likelihood
Y ~ normal(mu, sigma);
// Priors
sigma ~ exponential(1);
alpha_bloc ~ std_normal();
beta_H ~ std_normal();
beta_X1 ~ std_normal();
beta_X2 ~ std_normal();
}
generated quantities{
// Bayesian R2
real bayes_R2_res; // Residual-based based R2 (uses draws from the residual distribution)
real bayes_R2_mod; // Model-based R2 (uses draws from the modeled residual variances)
bayes_R2_res = variance(mu) / (variance(mu) + variance(Y-mu));
bayes_R2_mod = variance(mu) / (variance(mu) + square(sigma));
// // log likelihood for loo
// vector[N] log_lik;
// vector[N] muhat;
// for (n in 1:N) {
// log_lik[n] = normal_lpdf(Y[n] |mu[n],sigma);
// muhat[n] = normal_rng(mu[n], sigma);
// }
}
lapply(unique(height_data$cg), function(site_i){lapply(unique(df$method_input_code), function(method_input_code_i){# Subset the data df_sub <- df %>%filter(method_input_code == method_input_code_i & cg == site_i) sub_data <- height_data %>%filter(cg == site_i) %>%left_join(df_sub, by =c("cg","pop")) %>%left_join(pop_heights %>% dplyr::select(any_of(c("height", "pop"))) %>% dplyr::rename(init_height=height), by="pop")# Data in a list for Stan stanlist <-list(N =nrow(sub_data),Y=(sub_data$height-mean(sub_data$height))/sd(sub_data$height),X=(sub_data$varX -mean(sub_data$varX))/sd(sub_data$varX),H=(sub_data$init_height -mean(sub_data$init_height))/sd(sub_data$init_height),nb_bloc =length(unique(sub_data$block)),bloc =as.numeric(as.factor(sub_data$block)))# Run the models mod <-sampling(stancode, data = stanlist, iter =2000, warmup =1400, chains =4, cores =4, save_warmup =FALSE,pars = params_to_estimate) # Save the model and the stanlistlist(mod = mod, stanlist = stanlist) %>%saveRDS(file=here(paste0("outputs/ValidationCommonGarden/HeightModels/stan_models/M2/",site_i,"_",method_input_code_i,".rds"))) })})
Code
coefftab <-lapply(unique(height_data$cg),function(site_i){lapply(unique(df$method_input_code), function(method_input_code_i){# Subset the data - keeping only one set of method / SNP set sub_data <- df %>%filter(method_input_code == method_input_code_i & cg == site_i) mod <-readRDS(file=here(paste0("outputs/ValidationCommonGarden/HeightModels/stan_models/M2/", site_i,"_",method_input_code_i,".rds")))[["mod"]]# Save coefficients broom.mixed::tidyMCMC(mod,droppars =NULL, robust =FALSE, # give mean and standard deviationess = F, rhat = F, conf.int = T,conf.level =0.95) %>%#filter(str_detect(term, c('beta'))) %>% dplyr::rename(mean=estimate,std_deviation=std.error,conf_low=conf.low,conf_high=conf.high) %>%mutate(cg = site_i,method_input_code = method_input_code_i,method_input_name =unique(sub_data$method_input_name),method =unique(sub_data$method),input_name =unique(sub_data$input_name),input_code =unique(sub_data$input_code),.before=1) }) %>%bind_rows()}) %>%bind_rows()coefftab %>%saveRDS(file=here("outputs/ValidationCommonGarden/HeightModels/coefftab_M2.rds"))
4.2.3 Proportion of variance explained
The proportion of variance explained by the height models was estimated with the classical \(\mathcal{R}^{2}\), the residual-based Bayesian \(\mathcal{R}^{2}\) and the model-based Bayesian \(\mathcal{R}^{2}\).
The classical \(\mathcal{R}^{2}\) is calculated as follows:
where \(Var_\mu\) is the variance of the modelled predictive means and \(Var_{res}\) is the residual variance.
In the residual-based Bayesian \(\mathcal{R}^{2}\), \(Var_{res}\) comes from draws from the residual distribution: \(Var_{res} = Var(Y-\mu)\). In the model-based Bayesian \(\mathcal{R}^{2}\), \(Var_{res}\) comes from the posterior quantities of the fitted model such as \(Var_{res} = \sigma^2\). As stated in , this Bayesian version of \(\mathcal{R}^{2}\) can be considered as a data-based estimate of the proportion of variance explained for new data.
In the main manuscript, we report results from the model-based Bayesian \(\mathcal{R}^{2}\).
4.2.4 Running model without predictor
Code
stancode =stan_model(here("scripts/StanModels/ValidationCommonGarden_HeightModel_M2_WithoutPredictor.stan"))params_to_estimate <-c("beta_H","sigma","alpha_bloc","classic_R2","bayes_R2_mod","bayes_R2_res")coefftab <-lapply(unique(height_data$cg), function(site_i){# Subset the data sub_data <- height_data %>%filter(cg == site_i) %>%left_join(pop_heights %>% dplyr::select(any_of(c("height", "pop"))) %>% dplyr::rename(init_height=height), by="pop")# Data in a list for Stan stanlist <-list(N =nrow(sub_data),Y=(sub_data$height-mean(sub_data$height))/sd(sub_data$height),H=(sub_data$init_height -mean(sub_data$init_height))/sd(sub_data$init_height),nb_bloc =length(unique(sub_data$block)),bloc =as.numeric(as.factor(sub_data$block)))# Run the models mod <-sampling(stancode, data = stanlist, iter =2000, chains =4, cores =4, save_warmup =FALSE,pars = params_to_estimate) # Save the models mod %>%saveRDS(file=here(paste0("outputs/ValidationCommonGarden/HeightModels/stan_models/M2/M2_",site_i,"_WithoutPredictor.rds")))# Extract coefficients broom.mixed::tidyMCMC(mod,pars=params_to_estimate,droppars =NULL, robust =FALSE, # give mean and standard deviationess = F, rhat = F, conf.int = T,conf.level =0.95) %>% dplyr::rename(mean=estimate,std_deviation=std.error,conf_low=conf.low,conf_high=conf.high) %>%mutate(cg=site_i,method="WithoutPredictor",.before=1) }) %>%bind_rows()coefftab %>%saveRDS(file=here("outputs/ValidationCommonGarden/HeightModels/coefftab_M2_WithoutPredictor.rds"))
4.2.5 Model coefficients
We plot the mean and 95% credible intervals of the \(\beta_{X_1}\), \(\beta_{X_2}\) and \(\beta_H\) coefficients. Graph titles include the time in months corresponding to the age at which height and survival were recorded. Coefficients in the green area have the expected sign, reflecting lower height in populations with higher GO predictions.
Code
generate_interval_plots(model_i ="M2")
Joining with `by = join_by(cg, method_input_code, method_input_name, method, input_name, input_code)`
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = -Inf, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = -Inf, : All aesthetics have length 1, but the data has 150 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Joining with `by = join_by(cg, method_input_code, method_input_name, method, input_name, input_code)`
Joining with `by = join_by(cg, method_input_code, method_input_name, method, input_name, input_code)`
[[1]]
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = -Inf, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
[[2]]
[[3]]
[[4]]
[[5]]
[[6]]
4.2.6 Vizualising predicted associations
4.2.6.1 Mean height predictions
We plot the mean predicted relationships between tree height and GO predictions / CTD.
Code
generate_poly_plots(model_i ="M2")
4.2.6.2 Height predictions with uncertainty intervals
The graphs above do not show the estimate uncertainty. We generate some graphs to visualize the uncertainty around the estimates :
the uncertainty in the mean estimates (i.e. variability in posterior draws of the linear predictor \(\mu\))
the uncertainty in tree height predictions (i.e. after accounting for \(\sigma\) in the predictions).
Code
############################################################################## Visualizing the relationships between GO predictions and height predictions#############################################################################model_i <-"M2"coefftab <-readRDS(file=here(paste0("outputs/ValidationCommonGarden/HeightModels/coefftab_",model_i,".rds"))) %>%mutate(cg_name =case_when(cg =="asturias"~"Asturias, Spain (37 months)", cg =="bordeaux"~"Bordeaux, France (85 months)", cg =="caceres"~"Cáceres, Spain (8 months)", cg =="madrid"~"Madrid, Spain (13 months)", cg =="portugal"~"Fundão, Portugal (27 months)"))lapply(unique(coefftab$cg), function(cg_i){lapply(unique(coefftab$method_input_code), function(method_input_code_i){ sub_coefftab <- coefftab %>%filter(method_input_code == method_input_code_i & cg == cg_i)# Load the stanlist stanlist <-readRDS(file =here(paste0("outputs/ValidationCommonGarden/HeightModels/stan_models/M2/",cg_i,"_",method_input_code_i,".rds")))[[2]]# Loading the model mod <-readRDS(file =here(paste0("outputs/ValidationCommonGarden/HeightModels/stan_models/M2/",cg_i,"_",method_input_code_i,".rds")))[[1]]# we extract the posterior distributions of all parameters post <-as.data.frame(mod)# Vector of predictor values (based on the min and max of the GO predictions/CTD ) x_vec <-seq(min(stanlist$X),max(stanlist$X),0.05)# Predicting the linear predictor mu (predicting mean-centered height without sigma uncertainty)################################################################################################# we extract the posterior draws of mu, and its mean and HDPIs for each predictor value# Function to predict mean-centered height in block 2 and with the initial tree height fixed to the mean (i.e. fixed to zero as explanatory variables were mean-centered) f_mu <-function(x) post$`alpha_bloc[2]`+ post$`beta_X1`* x + post$`beta_X2`* x^2+ post$`beta_H`*mean(stanlist$H)#mean(sub_data$init_height) mu_pred <-sapply(x_vec, f_mu) %>% tibble::as_tibble() %>%rename_all(function(x) x_vec) %>%mutate(Iter =row_number()) %>%gather(x, y, -Iter) %>%group_by(x) %>%mutate(hpdi_l = HDInterval::hdi(y, credMass =0.90)[1],hpdi_h = HDInterval::hdi(y, credMass =0.90)[2]) %>%mutate(mu_mean =mean(y)) %>%ungroup() %>%mutate(x =as.numeric(x))# Keep mean and 90% HDPI of mu (one value for each iteration) mu_mean_df <- mu_pred %>% dplyr::select(x,mu_mean,hpdi_l,hpdi_h) %>% dplyr::distinct()# Predicting mean-centered height with sigma uncertainty########################################################## we extract the posterior draws of height predictions, and its mean and PIs for each predictor value y_pred <-sapply(x_vec,function(x)rnorm(NROW(post), post$`alpha_bloc[2]`+ post$`beta_X1`* x + post$`beta_X2`* x^2+ post$`beta_H`*mean(stanlist$H),#mean(sub_data$init_height), post$sigma)) %>%as_tibble() %>%rename_all(function(x) x_vec) %>%mutate(Iter =row_number()) %>%gather(x, y, -Iter) %>%group_by(x) %>%mutate(hpdi_l = HDInterval::hdi(y, credMass =0.90)[1],hpdi_h = HDInterval::hdi(y, credMass =0.90)[2]# pi_l = rethinking::PI(y, prob = 0.90)[1],# pi_h = rethinking::PI(y, prob = 0.90)[2] ) %>%ungroup() %>%mutate(x =as.numeric(x)) %>% dplyr::select(x,hpdi_l,hpdi_h) %>% dplyr::distinct()# Plots######## Plot options y_limits <-c(-3,3)if(unique(sub_coefftab$method =="CTD")){x_axis <-"Mean-centered CTD"} else { x_axis <-"Mean-centered GO predictions"} y_axis <-"Mean-centered tree height" p <-ggplot() +ylim(y_limits) +labs(y=y_axis, x=x_axis) +theme_bw()# First 100 posterior draws of the linear predictor mu p1 <- p +geom_point(data = mu_pred %>%filter(Iter <101),aes(x, y), alpha = .2, color ='dodgerblue')# Mean (line) and 90% HPDI (shaded region) of the linear predictor mu p2 <- p +geom_ribbon(data = y_pred,mapping =aes(x=x, ymin=hpdi_l, ymax=hpdi_h), alpha = .1, fill='dodgerblue') +geom_ribbon(data = mu_mean_df,aes(x = x, ymin = hpdi_l, ymax = hpdi_h),alpha = .2,fill='dodgerblue') +geom_line(data = mu_mean_df,aes(x=x, y=mu_mean), col="dodgerblue4") p1_p2 <-ggarrange(p1, p2 +labs(y=""), nrow =1)# Title title <-ggdraw() +draw_label(paste0(unique(sub_coefftab$cg_name)," - ",unique(sub_coefftab$method_input_name)),fontface ='bold',x =0, hjust =0) +theme(plot.margin =margin(0, 0, 0, 7))# merge title and plots p1_p2 <-plot_grid(title, p1_p2, ncol =1, rel_heights =c(0.1, 1))ggsave(p1_p2,file=here(paste0("figs/ValidationCommonGarden/HeightModels/HeightPredictions_MuPosteriorDraws_UncertaintyIntervals/",model_i,"_",cg_i,"_",method_input_code_i,".pdf")),device="pdf",height=6,width=11) }) })
lapply(unique(height_data$cg), function(site_i){# Subset the data sub_data <- height_data %>%filter(cg == site_i) %>%left_join(pop_heights %>% dplyr::select(any_of(c("height", "pop"))) %>% dplyr::rename(init_height=height), by="pop")# Data in a list for Stan stanlist <-list(N =nrow(sub_data),Y=(sub_data$height-mean(sub_data$height))/sd(sub_data$height),H=(sub_data$init_height -mean(sub_data$init_height))/sd(sub_data$init_height),nb_bloc =length(unique(sub_data$block)),bloc =as.numeric(as.factor(sub_data$block)))# Run the models mod <-sampling(stancode, data = stanlist, iter =2000, chains =4, cores =4, save_warmup =FALSE,warmup =1400,pars ="y_pred")# Save the stanlist and the modelsaveRDS(list(data=sub_data, mod=mod),file=here(paste0("outputs/ValidationCommonGarden/HeightModels/stan_models/M2/M2_WithoutPredictorWithPredictions_",site_i,".rds")))})
corrtab <-readRDS(here("outputs/ValidationCommonGarden/HeightModels/corrtab_M2.rds"))correlation_types <-c("pearson_correlation", "spearman_correlation")lapply(correlation_types, function(coeff){ p <- corrtab %>%pivot_longer(values_to ="correlation_estimate", names_to ="correlation_type", cols =all_of(correlation_types)) %>%mutate(correlation_pvalue =ifelse(correlation_type =="pearson_correlation", pearson_pvalue, spearman_pvalue)) %>% dplyr::select(-pearson_pvalue,-spearman_pvalue) %>%mutate(pvalue_signi =ifelse(correlation_pvalue <0.05, "p-value < 0.05", "p-value ≥ 0.05")) %>%filter(correlation_type == coeff) %>%mutate(cg_name =case_when(cg =="asturias"~"Asturias, Spain (37 months)", cg =="bordeaux"~"Bordeaux, France (85 months)", cg =="caceres"~"Cáceres, Spain (8 months)", cg =="madrid"~"Madrid, Spain (13 months)", cg =="portugal"~"Fundão, Portugal (27 months)")) %>%mutate(input_name =factor(input_name, levels =c(unique(corrtab$input_name)[[13]],unique(corrtab$input_name)[[11]],unique(corrtab$input_name)[[12]],unique(corrtab$input_name)[[10]],unique(corrtab$input_name)[[8]],unique(corrtab$input_name)[[9]],unique(corrtab$input_name)[[1]],unique(corrtab$input_name)[[2]],unique(corrtab$input_name)[[3]],unique(corrtab$input_name)[[4]],unique(corrtab$input_name)[[5]],unique(corrtab$input_name)[[6]],unique(corrtab$input_name)[[7]])),method =factor(method, levels =c(unique(corrtab$method)[[2]],unique(corrtab$method)[[3]],unique(corrtab$method)[[4]],unique(corrtab$method)[[5]],unique(corrtab$method)[[6]],unique(corrtab$method)[[1]])),cg_name =factor(cg_name, levels =c("Madrid, Spain (13 months)","Cáceres, Spain (8 months)","Asturias, Spain (37 months)","Bordeaux, France (85 months)","Fundão, Portugal (27 months)")))# Plots with CTD and SNP sets# ===========================p_allvar <- p %>%ggplot(aes(x = method,y = correlation_estimate,color = input_name,shape = pvalue_signi)) +geom_rect(inherit.aes=FALSE, aes(xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=0), color="transparent", fill="green", alpha=0.005) +geom_hline(yintercept =0,color="gray") +geom_point(position =position_dodge(width = .5), size=5) +facet_wrap(~cg_name, ncol=2) + {if(coeff=="pearson_correlation") ylab("Pearson correlation estimates")} + {if(coeff=="spearman_correlation") ylab("Spearman correlation estimates (rank correlation)") } +xlab("") +#ylim(c(-0.44,0.55)) +scale_color_manual(values=colors_coeff)+scale_shape_manual(values =c(16,18)) +#, name="p-value of the correlations"theme_bw() +labs(color="",shape="") +theme(axis.text.x =element_text(size=13),axis.text.y =element_text(size=13),axis.title.y =element_text(size=16),legend.title =element_blank(), strip.text.x =element_text(size =16),strip.background =element_blank(),legend.position =c(0.77,0.15),legend.text=element_text(size=13),panel.grid.minor.x=element_blank(),panel.grid.major.x=element_blank()) +guides(color =guide_legend(ncol=1),shape =guide_legend(override.aes =list(size =2 )))p_allvar# save in pdf and pngggsave(p_allvar,file=here(paste0("figs/ValidationCommonGarden/HeightModels/Correlations/",coeff,"_SNPsandCTD.pdf")),device="pdf",height=13.6,width=15)ggsave(p_allvar,file=here(paste0("figs/ValidationCommonGarden/HeightModels/Correlations/",coeff,"_SNPsandCTD.png")),height=13.6,width=15)# Adding some images to represent the climatic differences among common gardensannotation_custom2 <-function (grob, xmin =-Inf, xmax =Inf, ymin =-Inf, ymax =Inf, data){ layer(data = data, stat = StatIdentity, position = PositionIdentity, geom = ggplot2:::GeomCustomAnn,inherit.aes =TRUE, params =list(grob = grob, xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax))}df_png <- p %>% dplyr::select(cg_name) %>% distinct %>% dplyr::mutate(image =case_when(cg_name =="Asturias, Spain (37 months)"~"reports/cloud.png", cg_name =="Bordeaux, France (85 months)"~"reports/cloud.png", cg_name =="Cáceres, Spain (8 months)"~"reports/sun.png", cg_name =="Madrid, Spain (13 months)"~"reports/sun.png", cg_name =="Fundão, Portugal (27 months)"~'reports/cloud_and_sun.png'))list_pngs <-lapply(unique(p$cg_name), function(cg_name_i){sub <- p %>%filter(cg_name==cg_name_i) %>%slice(1)png_cg =annotation_custom2(rasterGrob(readPNG(here(df_png[df_png$cg_name==cg_name_i,"image"])),interpolate=TRUE), ymin =-0.65,ymax=-0.45,xmin =0.5,xmax =1.1, data=sub)})p_allvar_images <- p_allvar + list_pngs[[1]]+ list_pngs[[2]]+ list_pngs[[3]]+ list_pngs[[4]]+ list_pngs[[5]]p_allvar_images# save in pdf and pngggsave(p_allvar_images,file=here(paste0("figs/ValidationCommonGarden/HeightModels/Correlations/",coeff,"_SNPsandCTD_WithImages.pdf")),device="pdf",height=13.6,width=15)})
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = -Inf, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, : for 'p-value ≥ 0.05' in 'mbcsToSbcs': >= substituted for ≥ (U+2265)
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = -Inf, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, : for 'p-value ≥ 0.05' in 'mbcsToSbcs': >= substituted for ≥ (U+2265)
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = -Inf, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, : for 'p-value ≥ 0.05' in 'mbcsToSbcs': >= substituted for ≥ (U+2265)
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = -Inf, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, : for 'p-value ≥ 0.05' in 'mbcsToSbcs': >= substituted for ≥ (U+2265)
\(Y_{ipb}\) is the height of the individual \(i\) in the population \(p\) and block \(b\).
\(\sigma^{2}\) is the residual variance of the model.
\(B_b\) are the block intercepts.
\(X_p\) is the GO or CTD of the population \(p\), with \(\beta_{X1}\) and \(\beta_{X2}\) being its linear and quadratic coefficients, respectively. The quadratic term was included to allow for potential nonlinearity in the response, following Fitzpatrick et al. (2021).
\(H_p\) is a proxy of the initial height of the trees from population \(p\), i.e. when trees were planted. For that, we used the population varying intercepts calculated across all common gardens in the model 1 of Archambeau et al. (2022).
S4 class stanmodel 'anon_model' coded as follows:
data {
int N;
vector[N] Y; // Response variable (individual height)
vector[N] X; // Genomic offset or climatic transfer distances
vector[N] H; // Initial height of the populations
int<lower=0> nb_bloc; // Number of blocks
int<lower=0, upper=nb_bloc> bloc[N]; // Blocks
int<lower=0> nb_clon; // Number of clones
int<lower=0, upper=nb_clon> clon[N]; // Clones
}
parameters {
vector[nb_bloc] alpha_bloc; // Intercepts of the blocks
vector[nb_clon] alpha_clon; // Intercepts of the clones
real beta_0; // Global intercept
real beta_X1; // Linear coefficent of GO or CTD
real beta_X2; // Quadratic coefficent of GO or CTD
real beta_H; // Linear coefficient of the initial height of the populations
real<lower = 0> sigma; // Residual variance of the model
}
transformed parameters {
vector[N] mu; // Linear predictor
real R_squared; // R^2 to evaluate the goodness of fit of the model
mu = beta_0 + alpha_bloc[bloc] + alpha_clon[clon] + beta_X1 * X + beta_X2 * square(X) + beta_H * H;
R_squared = 1 - variance(Y - mu) / variance(Y);
}
model {
// Likelihood
Y ~ normal(mu, sigma);
// Priors
sigma ~ exponential(1);
alpha_bloc ~ std_normal();
alpha_clon ~ std_normal();
beta_0 ~ std_normal();
beta_X1 ~ std_normal();
beta_X2 ~ std_normal();
beta_H ~ std_normal();
}
// generated quantities{
// // log likelihood for loo
// vector[N] log_lik;
// vector[N] muhat;
// for (n in 1:N) {
// log_lik[n] = normal_lpdf(Y[n] |mu[n],sigma);
// muhat[n] = normal_rng(mu[n], sigma);
// }
// }
lapply(unique(height_data$cg), function(site_i){lapply(unique(df$method_input_code), function(method_input_code_i){# Subset the data df_sub <- df %>%filter(method_input_code == method_input_code_i & cg == site_i) sub_data <- height_data %>%filter(cg == site_i) %>%left_join(df_sub, by =c("cg","pop")) %>%left_join(pop_heights %>% dplyr::select(any_of(c("height", "pop"))) %>% dplyr::rename(init_height=height), by="pop")# Data in a list for Stan stanlist <-list(N =nrow(sub_data),Y=(sub_data$height-mean(sub_data$height))/sd(sub_data$height),X=(sub_data$varX -mean(sub_data$varX))/sd(sub_data$varX),H=(sub_data$init_height -mean(sub_data$init_height))/sd(sub_data$init_height),nb_bloc =length(unique(sub_data$block)),nb_clon =length(unique(sub_data$clon)),bloc =as.numeric(as.factor(sub_data$block)),clon =as.numeric(as.factor(sub_data$clon)))# Run the models mod <-sampling(stancode, data = stanlist, iter =2000, warmup =1400, chains =4, cores =4, save_warmup =FALSE,pars = params_to_estimate) # Save the model and the stanlistlist(mod = mod, stanlist = stanlist) %>%saveRDS(file=here(paste0("outputs/ValidationCommonGarden/HeightModels/stan_models/M3/",site_i,"_",method_input_code_i,".rds"))) })})
Warning message: “Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable. Running the chains for more iterations may help. See https://mc-stan.org/misc/warnings.html#bulk-ess”
Code
coefftab <-lapply(unique(height_data$cg),function(site_i){lapply(unique(df$method_input_code), function(method_input_code_i){# Subset the data - keeping only one set of method / SNP set sub_data <- df %>%filter(method_input_code == method_input_code_i & cg == site_i) mod <-readRDS(file=here(paste0("outputs/ValidationCommonGarden/HeightModels/stan_models/M3/", site_i,"_",method_input_code_i,".rds")))[["mod"]]# Save coefficients broom.mixed::tidyMCMC(mod,droppars =NULL, robust =FALSE, # give mean and standard deviationess = F, rhat = F, conf.int = T,conf.level =0.95) %>%#filter(str_detect(term, c('beta'))) %>% dplyr::rename(mean=estimate,std_deviation=std.error,conf_low=conf.low,conf_high=conf.high) %>%mutate(cg = site_i,method_input_code = method_input_code_i,method_input_name =unique(sub_data$method_input_name),method =unique(sub_data$method),input_name =unique(sub_data$input_name),input_code =unique(sub_data$input_code),.before=1) }) %>%bind_rows()}) %>%bind_rows()coefftab %>%saveRDS(file=here("outputs/ValidationCommonGarden/HeightModels/coefftab_M3.rds"))
4.3.3 Model coefficients
We plot the mean and 95% credible intervals of the \(\beta_{X_1}\), \(\beta_{X_2}\) and \(\beta_H\) coefficients. Graph titles include the time in months corresponding to the age at which height and survival were recorded. Coefficients in the green area have the expected sign, reflecting lower height in populations with higher GO predictions.
Code
generate_interval_plots(model_i ="M3")
Joining with `by = join_by(cg, method_input_code, method_input_name, method, input_name, input_code)`
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = -Inf, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = -Inf, : All aesthetics have length 1, but the data has 150 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Joining with `by = join_by(cg, method_input_code, method_input_name, method, input_name, input_code)`
Joining with `by = join_by(cg, method_input_code, method_input_name, method, input_name, input_code)`
Joining with `by = join_by(cg, method_input_code, method_input_name, method, input_name, input_code)`
[[1]]
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = -Inf, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
[[2]]
[[3]]
[[4]]
4.3.4 Predicted quadratic relationships
We plot the predicted quadratic relationships between tree height and GO predictions / CTD.
Code
generate_poly_plots(model_i ="M3")
4.4 Model 4
4.4.1 Model equation and code
Model 4 = Model 2 + Initial height as a confounder + Clone varying intercepts
\(Y_{ipb}\) is the height of the individual \(i\) in the population \(p\) and block \(b\).
\(\sigma^{2}\) is the residual variance of the model.
\(B_b\) are the block intercepts.
\(X_p\) is the GO or CTD of the population \(p\), with \(\beta_{X1}\) and \(\beta_{X2}\) being its linear and quadratic coefficients, respectively. The quadratic term was included to allow for potential nonlinearity in the response, following Fitzpatrick et al. (2021).
\(H_p\) is a proxy of the initial height of the trees from population \(p\), i.e. when trees were planted. For that, we used the population varying intercepts calculated across all common gardens in the model 1 of Archambeau et al. (2022).
\(\beta_0\) is the model global intercept.
\(C_c\) are the clone varying intercepts which follow a normal distribution of variance \(\sigma_C^2\).
S4 class stanmodel 'anon_model' coded as follows:
data {
int N;
vector[N] Y; // Response variable (individual height)
vector[N] X; // Genomic offset or climatic transfer distances
vector[N] H; // Initial height of the populations
int<lower=0> nb_bloc; // Number of blocks
int<lower=0, upper=nb_bloc> bloc[N]; // Blocks
int<lower=0> nb_clon; // Number of clones
int<lower=0, upper=nb_clon> clon[N]; // Clones
}
parameters {
vector[nb_bloc] alpha_bloc; // Intercepts of the blocks
real beta_0; // Global intercept
real beta_X1; // Linear coefficent of GO or CTD
real beta_X2; // Quadratic coefficent of GO or CTD
real beta_H; // Coefficient of the initial height of the populations
real<lower = 0> sigma; // Residual variance of the model
real<lower = 0> sigma_clon; // Variance of the clone intercepts
vector[nb_clon] z_clon; // Vector for non-centered parameterization
}
transformed parameters {
vector[N] mu; // Linear predictor
real R_squared; // R^2 to evaluate the goodness of fit of the model
vector[nb_clon] alpha_clon; // Intercepts of the clones
alpha_clon = z_clon*sigma_clon;
mu = beta_0 + alpha_bloc[bloc] + alpha_clon[clon] + beta_X1 * X + beta_X2 * square(X) + beta_H * H;
R_squared = 1 - variance(Y - mu) / variance(Y);
}
model {
// Likelihood
Y ~ normal(mu, sigma);
// Priors
beta_0 ~ normal(mean(Y),2);
sigma ~ exponential(1);
sigma_clon ~ exponential(1);
alpha_bloc ~ std_normal();
z_clon ~ std_normal();
beta_H ~ std_normal();
beta_X1 ~ std_normal();
beta_X2 ~ std_normal();
}
// generated quantities{
// // log likelihood for loo
// vector[N] log_lik;
// vector[N] muhat;
// for (n in 1:N) {
// log_lik[n] = normal_lpdf(Y[n] |mu[n],sigma);
// muhat[n] = normal_rng(mu[n], sigma);
// }
// }
lapply(unique(height_data$cg), function(site_i){lapply(unique(df$method_input_code), function(method_input_code_i){# Subset the data df_sub <- df %>%filter(method_input_code == method_input_code_i & cg == site_i) sub_data <- height_data %>%filter(cg == site_i) %>%left_join(df_sub, by =c("cg","pop")) %>%left_join(pop_heights %>% dplyr::select(any_of(c("height", "pop"))) %>% dplyr::rename(init_height=height), by="pop")# Data in a list for Stan stanlist <-list(N =nrow(sub_data),Y=(sub_data$height-mean(sub_data$height))/sd(sub_data$height),X=(sub_data$varX -mean(sub_data$varX))/sd(sub_data$varX),H=(sub_data$init_height -mean(sub_data$init_height))/sd(sub_data$init_height),nb_bloc =length(unique(sub_data$block)),nb_clon =length(unique(sub_data$clon)),bloc =as.numeric(as.factor(sub_data$block)),clon =as.numeric(as.factor(sub_data$clon)))# Run the models mod <-sampling(stancode, data = stanlist, iter =2000, warmup =1400, chains =4, cores =4, save_warmup =FALSE,pars = params_to_estimate) # Save the model and the stanlistlist(mod = mod, stanlist = stanlist) %>%saveRDS(file=here(paste0("outputs/ValidationCommonGarden/HeightModels/stan_models/M4/",site_i,"_",method_input_code_i,".rds"))) })})
Warning message: “Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable. Running the chains for more iterations may help. See https://mc-stan.org/misc/warnings.html#bulk-ess”
Code
coefftab <-lapply(unique(height_data$cg),function(site_i){lapply(unique(df$method_input_code), function(method_input_code_i){# Subset the data - keeping only one set of method / SNP set sub_data <- df %>%filter(method_input_code == method_input_code_i & cg == site_i) mod <-readRDS(file=here(paste0("outputs/ValidationCommonGarden/HeightModels/stan_models/M4/", site_i,"_",method_input_code_i,".rds")))[["mod"]]# Save coefficients broom.mixed::tidyMCMC(mod,droppars =NULL, robust =FALSE, # give mean and standard deviationess = F, rhat = F, conf.int = T,conf.level =0.95) %>% dplyr::rename(mean=estimate,std_deviation=std.error,conf_low=conf.low,conf_high=conf.high) %>%mutate(cg = site_i,method_input_code = method_input_code_i,method_input_name =unique(sub_data$method_input_name),method =unique(sub_data$method),input_name =unique(sub_data$input_name),input_code =unique(sub_data$input_code),.before=1) }) %>%bind_rows()}) %>%bind_rows()coefftab %>%saveRDS(file=here("outputs/ValidationCommonGarden/HeightModels/coefftab_M4.rds"))
4.4.2 Model coefficients
We plot the mean and 95% credible intervals of the \(\beta_{X_1}\), \(\beta_{X_2}\) and \(\beta_H\) coefficients. Graph titles include the time in months corresponding to the age at which height and survival were recorded. Coefficients in the green area have the expected sign, reflecting lower height in populations with higher GO predictions.
Code
generate_interval_plots(model_i ="M4")
Joining with `by = join_by(cg, method_input_code, method_input_name, method, input_name, input_code)`
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = -Inf, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = -Inf, : All aesthetics have length 1, but the data has 150 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
Joining with `by = join_by(cg, method_input_code, method_input_name, method, input_name, input_code)`
Joining with `by = join_by(cg, method_input_code, method_input_name, method, input_name, input_code)`
Joining with `by = join_by(cg, method_input_code, method_input_name, method, input_name, input_code)`
[[1]]
Warning in geom_rect(inherit.aes = FALSE, aes(xmin = -Inf, xmax = Inf, ymin = -Inf, : All aesthetics have length 1, but the data has 185 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing a single row.
[[2]]
[[3]]
[[4]]
4.4.3 Predicted quadratic relationships
We plot the predicted quadratic relationships between tree height and GO predictions / CTD.
Code
generate_poly_plots(model_i ="M4")
4.5 Sample size for each clone
We look at the number of height measurements for each clone. In the tables below (one for each common garden), the first column correspond to the number of height measurements and the second column correspond to the number of clones with this number of measurements.
Archambeau, Juliette, Marta Benito Garzón, Frédéric Barraquand, Marina de Miguel, Christophe Plomion, and Santiago C González-Martı́nez. 2022. “Combining Climatic and Genomic Data Improves Range-Wide Tree Height Growth Prediction in a Forest Tree.”The American Naturalist 200 (4): E141–59. https://www.journals.uchicago.edu/doi/abs/10.1086/720619.
Fitzpatrick, Matthew C, Vikram E Chhatre, Raju Y Soolanayakanahally, and Stephen R Keller. 2021. “Experimental Support for Genomic Prediction of Climate Maladaptation Using the Machine Learning Approach Gradient Forests.”Molecular Ecology Resources. https://onlinelibrary.wiley.com/doi/abs/10.1111/1755-0998.13374.